
    Online Active Inference and Learning

    We present a generalized framework for active inference, the selective acquisition of labels for cases at prediction time in lieu of using the estimated labels of a predictive model. We develop techniques within this framework for classifying in an online setting, for example, classifying the stream of web pages on which online advertisements are being served. Stream applications present novel complications because (i) we do not know at label-acquisition time which instances we will see, and (ii) instances repeat according to some unknown (and possibly skewed) distribution. To address these complications, we combine ideas from decision theory, cost-sensitive learning, and online density estimation, and we introduce a method for online estimation of the utility distribution that allows us to manage the labeling budget over the stream. The resulting model tells us which instances to label so that, by the end of each budget period, the budget is best spent (in expectation). We test the method on streams from a real application. The main results show that (1) our proposed approach to active inference on streams can indeed reduce error costs substantially compared to alternative approaches, and (2) more sophisticated online estimation achieves larger reductions in error. We then discuss the setting of simultaneously conducting active inference and active learning, and we argue, with some supporting evidence, that our expected-utility active inference strategy also selects good examples for learning.
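
    A minimal sketch of the kind of expected-utility labeling decision described above, assuming a binary classifier that outputs P(y=1|x), fixed misclassification costs, and a per-period labeling budget. The class and parameter names, the cost values, and the quantile-based budget heuristic are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def expected_error_cost(p_pos, cost_fp, cost_fn):
    """Expected cost of acting on the model's prediction instead of the true label."""
    predict_pos_cost = (1.0 - p_pos) * cost_fp  # predict positive, true label negative
    predict_neg_cost = p_pos * cost_fn          # predict negative, true label positive
    return min(predict_pos_cost, predict_neg_cost)

class StreamActiveInference:
    """Decide, instance by instance, whether to acquire the true label on a stream."""

    def __init__(self, budget_fraction, cost_fp=1.0, cost_fn=5.0, window=1000):
        self.budget_fraction = budget_fraction  # fraction of the stream we may label
        self.cost_fp = cost_fp
        self.cost_fn = cost_fn
        self.window = window
        self.recent_utilities = []              # rolling sample of observed utilities

    def decide(self, p_pos):
        utility = expected_error_cost(p_pos, self.cost_fp, self.cost_fn)
        self.recent_utilities.append(utility)
        self.recent_utilities = self.recent_utilities[-self.window:]
        # Label only instances whose utility falls in the top budget_fraction of
        # recently observed utilities (a simple stand-in for the paper's online
        # estimation of the utility distribution over the budget period).
        threshold = np.quantile(self.recent_utilities, 1.0 - self.budget_fraction)
        return utility >= threshold
```

    The quantile threshold is what ties label acquisition to the budget: if roughly budget_fraction of instances clear it, the budget is spent by the end of the period, in expectation.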

    Dominant Color Learning by Subject Extraction

    Presented as a research poster at the Women in Machine Learning Workshop (WiML ’12), Lake Tahoe, Nevada, USA. Advances in the digital media industry have resulted in exponential growth in available image data sets. This growth has in turn spurred great interest in methods for acquiring, processing, analyzing, and understanding images in order to produce numerical or symbolic information such as color and texture characteristics. Detecting the dominant color of an object in an image without any prior knowledge of the background model, the object's characteristics, or the scene geometry is a challenging problem. The two major challenges in assigning a dominant color to the image subject are isolating the subject via background subtraction and extracting the dominant color from the approximated subject region. In this work, we combine an estimated subject mask with the image's color histogram to detect the dominant image color.
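
    As a rough illustration of the final step, the sketch below combines a precomputed subject mask with a coarse color histogram to pick the dominant color of the masked region. It assumes the mask has already been produced by some background-subtraction step and that the image is an RGB uint8 array; the function name and bin count are illustrative choices, not taken from the paper.

```python
import numpy as np

def dominant_color(image, subject_mask, bins_per_channel=8):
    """Return the approximate dominant color of the pixels selected by subject_mask.

    image: HxWx3 uint8 array (assumed RGB); subject_mask: HxW boolean array.
    """
    subject_pixels = image[subject_mask]            # shape (N, 3), subject pixels only
    if subject_pixels.size == 0:
        raise ValueError("subject mask selects no pixels")
    # Quantize each channel into coarse bins and build a joint 3-D color histogram.
    bin_width = 256 // bins_per_channel
    quantized = subject_pixels // bin_width         # values in [0, bins_per_channel)
    hist, _ = np.histogramdd(quantized,
                             bins=[bins_per_channel] * 3,
                             range=[(0, bins_per_channel)] * 3)
    # The dominant color is taken as the center of the most populated bin.
    r, g, b = np.unravel_index(np.argmax(hist), hist.shape)
    return tuple(int((c + 0.5) * bin_width) for c in (r, g, b))
```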

    Improving fairness in machine learning systems: What do industry practitioners need?

    The potential for machine learning (ML) systems to amplify social inequities and unfairness is receiving increasing popular and academic attention. A surge of recent work has focused on the development of algorithmic tools to assess and mitigate such unfairness. If these tools are to have a positive impact on industry practice, however, it is crucial that their design be informed by an understanding of real-world needs. Through 35 semi-structured interviews and an anonymous survey of 267 ML practitioners, we conduct the first systematic investigation of commercial product teams' challenges and needs for support in developing fairer ML systems. We identify areas of alignment and disconnect between the challenges faced by industry practitioners and the solutions proposed in the fair ML research literature. Based on these findings, we highlight directions for future ML and HCI research that will better address industry practitioners' needs. Comment: To appear in the 2019 ACM CHI Conference on Human Factors in Computing Systems (CHI 2019).

    Cleaning search results using term distance features

    The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their results, the incentive for spammers to use increasingly sophisticated techniques has never been higher, since the commercial success of a Web page is strongly correlated with the number of views that page receives. This paper describes a term-based technique for spam detection built on a simple new summary data structure, called the Term Distance Histogram, that tries to capture the topical structure of a page. We apply this technique as a post-filtering step to a major search engine. Our experiments show that we are able to detect many of the artificially generated spam pages that remained in the engine's results. Specifically, our method detects many web pages generated by techniques such as dumping, weaving, or phrase stitching [11]: spamming techniques designed to achieve high rankings while still exhibiting many of the individual word frequency (and even bi-gram) properties of natural human text.
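
    To make the idea of a term-distance summary concrete, here is a rough sketch that summarizes a page by a normalized histogram of gaps between repeated occurrences of the same term; a downstream classifier could use such histograms to separate natural text from stitched or woven spam. The construction is an assumption for illustration and may differ in detail from the paper's Term Distance Histogram.

```python
import numpy as np

def term_gap_histogram(tokens, max_gap=50):
    """Histogram of distances between consecutive occurrences of the same term.

    tokens: list of lowercased word tokens for one page; gaps larger than
    max_gap are clipped into the last bin.
    """
    last_seen = {}
    gaps = []
    for position, term in enumerate(tokens):
        if term in last_seen:
            gaps.append(min(position - last_seen[term], max_gap))
        last_seen[term] = position
    hist = np.bincount(np.asarray(gaps, dtype=int), minlength=max_gap + 1).astype(float)
    # Normalize so pages of different lengths are comparable as feature vectors.
    return hist / max(hist.sum(), 1.0)
```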